We are interested in understanding the relationship between
Bayesian inference and evidence theory. The concept of a set of 
probability distributions is central both in robust Bayesian analy- 
sis and in some versions of Dempster-Shafer’s evidence theory. We 
interpret imprecise probabilities as imprecise posteriors obtainable 
from imprecise likelihoods and priors, both of which are convex 
sets that can be considered as evidence and represented with, e.g., 
DS-structures. Likelihoods and prior are in Bayesian analysis com- 
bined with Laplace’s parallel composition. The natural and simple 
robust combination operator makes all pairwise combinations of 
elements from the two sets representing prior and likelihood. Our 
proposed combination operator is unique, and it has interesting 
normative and factual properties. We compare its behavior with 
other proposed fusion rules, and earlier efforts to reconcile Bayesian 
analysis and evidence theory. The behavior of the robust rule is con- 
sistent with the behavior of Fixsen/Mahler’s modified Dempster’s 
(MDS) rule, but not with Dempster’s rule. The Bayesian frame- 
work is liberal in allowing all significant uncertainty concepts to be 
modeled and taken care of and is therefore a viable, but probably 
not the only, unifying structure that can be economically taught 
and in which alternative solutions can be modeled, compared and 
explained. 
Several, apparently incomparable, approaches exist
for uncertainty management. Uncertainty management 
is a broad area applied in many different fields, where 
information about some underlying, not directly observ- 
able, truth — the state of the world — is sought from a 
set of observations that are more or less reliable. These 
observations can be, for example, measurements with 
random and/or systematic errors, sensor readings, or re- 
ports submitted by observers. In order that conclusions 
about the conditions of interest be possible, there must 
be some assumptions made on how the observations re- 
late to the underlying state about which information is 
sought. Most such assumptions are numerical in nature, 
giving a measure that indicates how plausible different 
underlying states are. Such measures can usually be nor- 
malized so that the end result looks very much like a 
probability distribution over the possible states of the 
world, or over sets of possible world states. However, 
uncertainty management and information fusion are often
concerned with complex technical, social or biological 
systems that are incompletely understood, and it would 
be naive to think that the relationship between observa- 
tion and state can be completely captured. At the same 
time, such systems must have at least some approximate 
ways to relate observation with state in order to make 
uncertainty management at all possible. 

It has been a goal in research to encompass all 
aspects of uncertainty management in a single frame- 
work. Attaining this goal should make the topic teach- 
able in undergraduate and graduate engineering cur- 
ricula and facilitate engineering applications develop- 
ment. We propose here that robust Bayesian analysis is 
such a framework. The Dempster-Shafer or evidence 
theory originated within Bayesian statistical analysis 
[19], but when developed by Shafer [51] it took the concept
of belief assignment rather than probability distribution
as primitive. The assumption is that bodies of evidence
(beliefs about the possible worlds of interest) can be
taken as primitives rather than sampling functions
and priors. Although this idea has had
considerable popularity, it is inherently dangerous since 
it seems to move application away from foundational 
justification. When the connection to Bayes’ method 
and Dempster’s application model is broken, it is no 
longer necessary to use the Dempster combination rule, 
and evidence theory abounds with proposals on how 
bodies of evidence should be interpreted and combined, 
as a rule with convincing but disparate argumentation. 
But there seem to exist no other bases for obtaining
bodies of evidence than likelihoods and priors, and
therefore an analysis of a hypothetical Bayesian obtain- 
ment of bodies of evidence can bring light to problems 
in evidence theory. Particularly, a body of evidence rep- 
resented by a DS-structure has an interpretation as a set 
of possible probability distributions, and combining or 
aggregating two such structures can be done in robust 
Bayesian analysis. The resulting combination operator
is trivial, but compared to other similar operators it has 
interesting, even surprising, behavior and normative ad- 
vantages. Some concrete progress in working with con- 
vex sets of probability vectors has been described in 
[41, 57, 29]. It appears that the robust combination op- 
erator we discuss has not been analyzed in detail and 
compared to its alternatives, and is missing in recent 
overviews of evidence and imprecise probability the- 
ory. Our ideas are closely related to problems discussed 
in [32] and in the recent and voluminous report [21], 
which also contains a quite comprehensive bibliogra- 
phy. The Workshop hosted by the SANDIA lab has 
resulted in an overview of current probabilistic uncer- 
tainty management methods [34]. A current overview of 
alternative fusion and estimation operators for tracking 
and classification is given in [45]. 

The main objective of this paper is to propose that 
precise and robust Bayesian analysis are unifying, sim- 
ple and viable methods for information fusion, and that 
the large number of methods possible can and should 
be evaluated by taking into account the appropriateness 
of statistical models chosen in the particular applica- 
tion where it is used. We are aware, however, that the 
construction of Bayesian analysis as a unifying concept 
has no objective truth. It is meant as a post-modernistic 
project facilitating teaching and returning artistic free- 
dom to objective science. The Bayesian method is so lib- 
eral that it almost never provides unique exact solutions 
to inference and fusion problems, but is completely 
dependent on insightful modeling. The main obstacle 
to achieving acceptance of the main objective seems 
to be the somewhat antagonistic relationship between 
the different schools, where sweeping arguments have
sometimes been made that seem rather unfair regardless of who
launched them, typical examples being [42, 51] and the
discussions following them. 

Another objective is to investigate the appropriate- 
ness of particular fusion and estimation operations, and 
their relationships to the robust as well as the precise 
Bayesian concept. Specifically, we show that the choice 
between different fusion and estimation operations can 
be guided by a Bayesian investigation of the application. 

We also want to connect the analysis to practical 
concerns in information fusion and keep the mathemat- 
ical/theoretical level of the presentation as simple as 
possible, while also examining the problem to its full 
depth. A quite related paper promoting similar ideas is 
Mahler [43], which however is terser and uses some- 
what heavier mathematical machinery. 

Many comparisons have been made of Bayesian
and evidential reasoning with the objective of guiding
practice, among others [47, 10, 11, 50]. It is generally
found that the methods are different and therefore
one should choose a method that matches the application
in terms of quantities available (evidence or
likelihoods and priors), or the prevailing culture and
construction of the application. Although the easiest
way forward, this advice seems somewhat short-sighted
given the long lifespan of typical advanced applications
and the significant changes in understanding
and availability of all kinds of data during this lifespan.

In Section 2 we review Bayesian analysis and in 
Section 3 dynamic Bayesian (Chapman-Kolmogorov/
Kalman) analysis. In Section 4 we describe robust
Bayesian analysis and some of its relations to
DS theory; in Section 5 we discuss decisions under un- 
certainty and imprecision and in Section 6 Zadeh’s well- 
known example. In Section 7 we derive some evidence 
fusion operations and the robust combination operator. 
We illustrate their performance on a paradoxical exam- 
ple related to Zadeh’s in Section 8, and wrap up with 
conclusions in Section 9. 

2. BAYESIAN ANALYSIS 

Bayesian analysis is usually explained [7, 38, 52,
24] using the formula

$$f(\lambda \mid x) \propto f(x \mid \lambda)\, f(\lambda) \qquad (1)$$

where $\lambda \in \Lambda$ is the world of interest among $n = |\Lambda|$
possible worlds (sometimes called parameter space), and
$x \in X$ is an observation among possible observations.
The distinction between observation and world space is
not necessary but is convenient: it indicates what our
inputs are (observations) and what our outputs are (belief
about possible worlds). The functions in the formula
are probability distributions, discrete or continuous. We
use a generic function notation common in statistics,
so the different occurrences of $f$ denote different functions
suggested by their arguments. The sign $\propto$ indicates
that the left side is proportional to the right side (as a
function of $\lambda$), with the normalization constant left out.
In (1), $f(x \mid \lambda)$ is a sampling distribution, or likelihood
when regarded as a function of $\lambda$ for a given $x$, which
connects observation space and possible world space
by giving a probability distribution of observed value
for each possible world, and $f(\lambda)$ is a prior describing
our expectation on what the world might be. The rule
(1) gives the posterior distribution $f(\lambda \mid x)$ over possible
worlds $\lambda$ conditional on observations $x$. A paradox
arises if the supports of $f(\lambda)$ and $f(x \mid \lambda)$ are disjoint
(since each possible world is ruled out either by the
prior or by the likelihood), a possibility we will ignore
throughout this paper. Equation (1) is free of technical
complication and easily explainable. It generalizes, however,
to surprisingly complex settings, as required of any
device helpful in design of complex technical systems.
In such systems, it is possible that $x$ represents a quantity
which is not immediately observable, but instead our
information about $x$ is given by a probability distribution
$f(x)$, typically obtained as a posterior from (1). Such
observations are sometimes called fuzzy observations.
In this case, instead of using (1) we apply:

$$f(\lambda \mid f(x)) \propto \int f(x \mid \lambda)\, f(x)\, f(\lambda)\, dx. \qquad (2)$$
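
As a concrete illustration of (1) and (2) in the finite case, the following Python sketch computes a posterior from a precise observation and from a fuzzy one. The two-world diagnostic example, its numbers, and all function names are assumptions made for illustration; they do not come from the paper.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


def posterior(prior, likelihood, x):
    """Equation (1): f(lambda | x) is proportional to f(x | lambda) f(lambda)."""
    return normalize({lam: likelihood[lam][x] * prior[lam] for lam in prior})


def posterior_fuzzy(prior, likelihood, fx):
    """Equation (2) with the integral replaced by a sum over a finite X."""
    return normalize({
        lam: prior[lam] * sum(likelihood[lam][x] * p for x, p in fx.items())
        for lam in prior
    })


# Hypothetical example: two possible worlds and a test with two outcomes.
prior = {"healthy": 0.9, "ill": 0.1}
likelihood = {
    "healthy": {"neg": 0.95, "pos": 0.05},
    "ill": {"neg": 0.20, "pos": 0.80},
}

post_precise = posterior(prior, likelihood, "pos")
post_fuzzy = posterior_fuzzy(prior, likelihood, {"pos": 0.7, "neg": 0.3})
post_degenerate = posterior_fuzzy(prior, likelihood, {"pos": 1.0})
```

With a fuzzy observation that puts all its mass on one value, (2) reduces to (1), which is what `post_degenerate` is meant to show.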
Ed Jaynes made (1) the basis for teaching science
and interpretation of measurements [38]. In general, for 
infinite (compact metric) observation spaces or possi- 
ble world sets, some measure-theoretic caution is called 
for, but it is also possible to base the analysis on well- 
behaved limit processes in each case as pointed out by, 
among others, Jaynes [38]. We will here follow Jaynes’ 
approach and thus discuss only the finite case. That 
generalization to infinite and/or complexly structured 
unions of spaces of different dimensions and quotiented 
over symmetry relations is possible is known although 
maybe not obvious. Mahler claims that such applica- 
tions are not Bayesian in [43], but they can apparently 
be described by (1) and similar problems are investi- 
gated within the Bayesian framework, for example by 
Green [26]. Needless to say, since the observation and 
world spaces can be high-dimensional and the prior and 
likelihood can be arbitrarily complex, practical work 
with (1) is full of pitfalls and one often encounters what 
looks like counterintuitive behaviors. On closer inves- 
tigation, such problems can lead to finding a modeling 
error, but more often it shows that (1) is indeed better 
than one’s first intuitive attitude. 

It has been an important philosophical question to 
characterize the scope of applicability of (1), which led
to the distinction between objective and subjective prob- 
ability, among other things. Several books and papers, 
among others [17, 49, 42, 15], claim that, under rea- 
sonable assumptions, (1) is the only consistent basis 
for uncertainty management. However, the minimal as- 
sumptions truly required to obtain this result turn out 
on closer inspection to be rather complex, as discussed 
in [7, 64, 33, 31, 46, 35, 2]. One simple assumption 
usually made in those studies that conclude in favor of 
(1) is that uncertainty is measured by a real number 
or on an ordered scale. Many established uncertainty 
management methods however measure uncertainty on 
a partially ordered scale and apparently do not use (1)
and the accompanying philosophy. Among probability 
based alternatives to Bayesian analysis with partially 
ordered uncertainty concepts are imprecise probabili- 
ties or lower/upper prevision theory [62], the Dempster- 
Shafer (DS) [51], the Fixsen/Mahler (MDS) [22] and 
Dezert-Smarandache (DSmT) [53] theories. In these 
schools, it is considered important to develop the the- 
ory without reference to classical Bayesian thinking. 
In particular, the assumption of precise prior and sam- 
pling distributions is considered indefensible. Those as- 
sumptions are referred to as the dogma of precision in 
Bayesian analysis [63]. 

Indeed, when the inference process is widened from 
an individual to a social or multi-agent context, there 
must be ways to accommodate different assessments of 
priors and likelihoods. Thus, there is a possibility that 
two experts make the same inference using different 
likelihoods and priors. If expert 1 obtained observation
set $X_1 \subseteq X$ and expert 2 obtained observation set
$X_2 \subseteq X$, they would obtain a posterior belief of, e.g.,
a patient’s condition expressible as
$f_i(\lambda \mid X_i) \propto f_i(X_i \mid \lambda)\, f_i(\lambda)$, for $i = 1, 2$.
Here we have not assumed that
the two experts used the same sampling and prior distributions.
Even if training aims at giving the two experts
the same “knowledge” in the form of sampling function
and prior, this ideal cannot be achieved completely in
practice. The Bayesian method prescribes that expert $i$
states the probability distribution $f_i(\lambda \mid X_i)$ as his belief
about the patient. If they use the same sampling function
and prior, the Bayesian method also allows them to
combine their findings to obtain:

$$f(\lambda \mid \{X_1, X_2\}) \propto f(\{X_1, X_2\} \mid \lambda)\, f(\lambda) = f(X_1 \mid \lambda)\, f(X_2 \mid \lambda)\, f(\lambda) \qquad (3)$$

under the assumption:

$$f(\{X_1, X_2\} \mid \lambda) = f(X_1 \mid \lambda)\, f(X_2 \mid \lambda).$$

The assumption appears reasonable in many cases. 
In cases where it is not, the discrepancy should be 
entered in the statistical model. This is particularly 
important in information fusion for those cases where 
the first set of observations was used to define the 
second investigation, as in sensor management. This 
is an instance of selection bias. Ways of handling data 
selection biases are discussed thoroughly in [24]. Data 
selection bias is naturally and closely related to the 
missing data problem that has profound importance in 
statistics [48] and has also been examined in depth in 
the context of imprecise probability fusion [16]. 

It is important to observe that it is the two experts’
likelihood functions, not their posterior beliefs, that can
be combined; otherwise we would replace the prior by
its normalized square and the real uncertainty would be
underestimated. This is at least the case if the experts
obtained their training from a common body of medical
experience coded in textbooks. If the posterior is
reported and we happen to know the prior, the likelihood
can be obtained by $f(X \mid \lambda) \propto f(\lambda \mid X) / f(\lambda)$ and
the fusion rule becomes

$$f(\lambda \mid \{X_1, X_2\}) \propto f(\lambda \mid X_1)\, f(\lambda \mid X_2) / f(\lambda). \qquad (4)$$
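
The difference between fusing likelihoods, fusing posteriors with the prior divided out, and naively multiplying posteriors can be checked numerically. The sketch below assumes a finite set of three worlds, a shared prior that is non-zero everywhere, and made-up likelihood values; none of these numbers are from the paper.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


def fuse_likelihoods(prior, lik1, lik2):
    """Equation (3): both likelihoods times a single copy of the prior."""
    return normalize({w: lik1[w] * lik2[w] * prior[w] for w in prior})


def fuse_posteriors(prior, post1, post2):
    """Equation (4): product of the posteriors divided by the shared prior."""
    return normalize({w: post1[w] * post2[w] / prior[w] for w in prior})


def naive_product(post1, post2):
    """Multiplying posteriors directly, which counts the prior twice."""
    return normalize({w: post1[w] * post2[w] for w in post1})


# Hypothetical shared prior and the two experts' likelihoods for what they saw.
prior = {"a": 0.7, "b": 0.2, "c": 0.1}
lik1 = {"a": 0.2, "b": 0.5, "c": 0.3}
lik2 = {"a": 0.1, "b": 0.6, "c": 0.3}

post1 = normalize({w: lik1[w] * prior[w] for w in prior})
post2 = normalize({w: lik2[w] * prior[w] for w in prior})

fused_3 = fuse_likelihoods(prior, lik1, lik2)
fused_4 = fuse_posteriors(prior, post1, post2)
double_counted = naive_product(post1, post2)
```

Here `fused_3` and `fused_4` agree, while `double_counted` shifts weight toward world `a`, which the prior already favoured.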

The existence of different agents with different pri- 
ors and likelihoods is maybe the most compelling argu- 
ment to open the possibility for robust Bayesian analy- 
sis, where the likelihood and prior sets would in the first 
approximation be the convex closure of the likelihoods 
and prior of different experts. 

3. WHAT IS REQUIRED FOR SUCCESSFUL
APPLICATION OF BAYES METHOD?

The formula (1) is deceptively simple, and hides 
the complexity of a real world application where many 
engineering compromises are inevitable. Nevertheless, 
any method claimed to be Bayesian must relate to (1) 
and include all substantive application knowledge in the 
parameter and observation spaces, the likelihood and the 
prior. It is in general quite easy to show the Bayesian 



ARNBORG: ROBUST BAYESIANISM: RELATION TO EVIDENCE THEORY 



method to be better or worse than an alternative by not 
including relevant and necessary application knowledge 
in (1) or in the alternative method. Let us illustrate 
this by an analysis of the comparison made in [56]. 
The problem is to track and classify a single target. 
The tracking problem is solved with a dynamic version 
of Bayes method, known as the Bayesian Chapman- 
Kolmogorov relationship: 

$$f(\lambda_t \mid D_t) \propto f(d_t \mid \lambda_t) \int f(\lambda_t \mid \lambda_{t-1})\, f(\lambda_{t-1} \mid D_{t-1})\, d\lambda_{t-1}$$

$$f(\lambda_0 \mid D_0) = f(\lambda_0). \qquad (5)$$

Here $D_t = (d_1, \ldots, d_t)$ is the sequence of observations
obtained at different times, and $f(\lambda_t \mid \lambda_{t-1})$ is the
maneuvering (process innovation) noise assumed. The latter is
a probability distribution function (pdf) over state $\lambda_t$
dependent on the state at the previous time-step, $\lambda_{t-1}$.
When tracking targets that display different levels of 
maneuvering like transportation, attack and dog-fight 
for a fighter airplane, it has been found appropriate to 
apply (5) with different filters with levels of innovation 
noise corresponding to the maneuvering states, and to 
declare the maneuvering state that corresponds to the 
best matching filter. In the paper [56] the same method 
is proposed for a different purpose, namely the classi- 
fication of aircraft (civilian, bomber, fighter) based on 
their acceleration capabilities. This is done by ad hoc 
modifications of (5) that do not seem to reflect substan- 
tive application knowledge, namely that the true target 
class is unlikely to change, and hence does not work 
well. The Bayesian solution to this problem would in- 
volve looking at (5) with a critical mind. Since we want 
to jointly track and classify, the state space should be,
e.g., $P \times V \times C$, where $P$ and $V$ are position and velocity
spaces and $C$ is the class set, $\{c, b, f\}$. The innovation
process should take account of the facts that the target
class in this case does not change, and that the civilian
and bomber aircraft have bounded acceleration capacities.
This translates to two requirements on the process
innovation component $f(\lambda_t \mid \lambda_{t-1})$ (assuming unit
time sampling):

$$f((p_t, v_t, c_t) \mid (p_{t-1}, v_{t-1}, c_{t-1})) = 0 \quad \text{if } c_t \neq c_{t-1}$$

$$f((p_t, v_t, k) \mid (p_{t-1}, v_{t-1}, k)) = 0 \quad \text{if } |v_t - v_{t-1}| > a_k$$

where $a_k$ is the highest possible acceleration of target
class $k$. Such an innovation term can be (and often is)
described by a Gaussian with variance tuned to $a_k$, or
by a bank of Gaussians. With this innovation term, the 
observation of a high acceleration dampens permanently 
the marginal probability of having a target class inca- 
pable of such acceleration. This is the natural Bayesian 
approach to the joint tracking and classification prob- 
lems. Similar effects can be obtained in the robust Bayes 
and TBM [56] frameworks. As a contrast, the experi- 
ments reported by Oxenham et al. [44] use an appro- 
priate innovation term and also give more reasonable 
results, both for the TBM and the Bayesian
Chapman-Kolmogorov approaches. The above is not meant as an
argument that one of the two approaches compared in 
[56] is the preferred one. Our intention is rather to sug- 
gest that appropriate modeling may be beneficial for 
both approaches. 
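
The effect of the two constraints on the innovation term can be seen in a toy discrete filter. The sketch below is a simplification under stated assumptions: position is dropped, velocity lives on a hypothetical integer grid, the innovation term is uniform over reachable velocities (a hard-zero stand-in for the Gaussian mentioned above), and the sensor model and acceleration bounds are invented for illustration.

```python
VELOCITIES = range(11)  # hypothetical coarse 1-D velocity grid
MAX_ACCEL = {"civilian": 1, "bomber": 2, "fighter": 5}  # assumed a_k per class


def transition(v_prev, v, k):
    """f((v, k) | (v_prev, k)): zero whenever |v - v_prev| > a_k."""
    if abs(v - v_prev) > MAX_ACCEL[k]:
        return 0.0
    reachable = sum(1 for u in VELOCITIES if abs(u - v_prev) <= MAX_ACCEL[k])
    return 1.0 / reachable


def obs_likelihood(d, v):
    """Assumed sensor: reads the true velocity, or is off by one step."""
    if d == v:
        return 0.8
    return 0.1 if abs(d - v) == 1 else 0.0


def step(belief, d):
    """One application of (5) on the joint state (v, k).

    The class never changes, so the prediction sums only over
    previous states that share the same class k.
    """
    predicted = {
        (v, k): sum(transition(vp, v, k) * belief[(vp, k)] for vp in VELOCITIES)
        for v in VELOCITIES
        for k in MAX_ACCEL
    }
    weighted = {s: obs_likelihood(d, s[0]) * p for s, p in predicted.items()}
    total = sum(weighted.values())
    return {s: p / total for s, p in weighted.items()}


def class_marginal(belief):
    """Marginal probability of each target class."""
    return {k: sum(belief[(v, k)] for v in VELOCITIES) for k in MAX_ACCEL}


n_states = len(VELOCITIES) * len(MAX_ACCEL)
belief = {(v, k): 1.0 / n_states for v in VELOCITIES for k in MAX_ACCEL}

for d in (3, 3):  # steady flight
    belief = step(belief, d)
before_jump = class_marginal(belief)

for d in (8, 8, 8):  # a jump only the fighter can make, then steady flight
    belief = step(belief, d)
after_jump = class_marginal(belief)
```

Because the class component of the state is fixed, the probability lost by the civilian and bomber classes at the jump does not return during the later steady observations.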

The range of applications where an uncertainty management
problem is approached using (1) or (5) is extremely
broad. In the above example, the parameter $\lambda$
consists of one state vector (position and velocity vectors
of a target) and its target label, thus the parameter
space is (for 3D tracking) $\mathbb{R}^6 \times C$ where $C$ is a finite
set of target labels. In our main example, $\lambda$ is just an
indicator with three possible values. In many image processing
applications, the parameter $\lambda$ is the scene to be
reconstructed from the data $x$, which is commonly called
the film even if it is nowadays not registered on pho- 
tographic film and is not even necessarily represented 
as a 2D image. This approach has been found excellent 
both for ordinary camera reconstruction problems and 
for special types of cameras as exemplified by Positron 
Emission Tomography and functional Magnetic Reso- 
nance Imaging, the type of camera and reconstruction 
objective having a profound influence on the choice of 
likelihood and priors, see [3, 27]. In genetic investigations,
complex Bayesian models are also widely used,
and here the parameter $\lambda$ could be a description of how
reproduction in a set of individuals in a family has been 
produced by selection of chromosomes from parents, 
the positions of crossovers and the position of one or 
more hypothesized disease-causing gene(s), whereas the 
data are the genotypes and disease status of individuals, 
plus individual covariates that may environmentally in- 
fluence development of disease. For a unified treatment 
of this problem family, see [14]. Another fascinating ex- 
ample is Bayesian identification of state space dynamics 
in time series, where the parameter is the time series of 
invisible underlying states, a signaling distribution (out- 
put distribution as a function of latent state) and the state 
change probability distributions [59]. 

Characteristic of cases where (1) and (5) are not as 
easily accepted is the presence of two different kinds 
of uncertainty, often called aleatory and epistemic un- 
certainty, where the former can be called “pure ran- 
domness” as one perceives dice (Latin: alea) throw- 
ing, while the latter is caused by “lack of knowledge” 
(from the Greek word for knowledge, episteme). Al- 
though one can argue about the relevance of this distinc- 
tion, application owners have typically a strong sense 
of the distinction, particularly in risk assessment. The 
consequence is that the concepts of well-defined pri- 
ors and likelihoods can be, and have been, questioned. 
The Bayesian answer to this critique is robust Bayesian 
analysis. 

4. ROBUST BAYES AND EVIDENCE THEORY 

In (global) robust Bayesian analysis [5, 36], one ac- 
knowledges that there can be ambiguity about the prior 
and sampling distributions, and it is accepted that a convex
set of such distributions is used in inference. The
idea of robust Bayesian analysis goes back to the pio- 
neers of Bayesian analysis [17, 39], but the computa- 
tional and conceptual complexities involved meant that 
it could not be fully developed in those days. Instead, 
a lot of effort went into the idea of finding a canonical 
and unique prior, an idea that seems to have failed ex- 
cept for finite problems with some kind of symmetry, 
where a natural generalization of Bernoulli's indiffer- 
ence principle has become accepted. The problem is that 
no proposed priors are invariant under arbitrary rescal- 
ing of numerical quantities or non-uniform coarsening 
or refinement of the current frame of discernment. The 
difficulty of finding precise and unique priors has been 
taken as an argument to use some other methods, like 
evidence theory. However, as we shall see, this is an illu- 
sion, and avoiding use of an explicit prior usually means 
implicit reliance on Bernoulli’s principle of indifference
anyway. Likewise, should there be an acceptable prior,
it can and should be used both in evidence theory and
in Bayesian theory. This was pointed out, e.g., in [6,
ch. 3.4].

Convex sets of probability distributions can be arbi- 
trarily complex. Such a set can be generated by mixing 
of a set of “corners” (called simplices in linear program- 
ming theory) and the set of corners can be arbitrarily 
large already for sets of probability distributions over 
three elements. 

In evidence theory, the DS-structure is a representation
of a belief over a set of possible worlds $\Lambda$
(commonly called the frame of discernment $\Theta$
in evidence theory) by a probability distribution
$m$ over its power-set (excluding the empty set), also known
as a basic probability assignment (bpa), basic belief assignment
(bba), bma, or DS-structure (terminology is not stable,
we will use DS-structure). The sets assigned non-zero
probability in a DS-structure are called its focal ele- 
ments, and those that are singletons are called atoms. A 
DS-structure with no mass assigned to non-atoms is a 
precise (sometimes called Bayesian) DS-structure. Even 
if it is considered important in many versions of DS the- 
ory not to equate a DS-structure with a set of possible 
distributions, such a perspective is prevalent in tutorials 
(e.g., [30, ch. 7] and [8, ch. 8]), explicit in Dempster’s 
work [18], and almost unavoidable in a teaching situa- 
tion. It is also compellingly suggested by the common 
phrase that the belief assigned to a non-singleton can 
flow freely to its singleton members, and the equiva- 
lence between a DS-structure with no mass assigned to 
non-singletons and the corresponding probability
distribution [55]. Among publications elaborating on the
possible difference between probability and other nu- 
merical uncertainty measures are [32, 55, 20]. 

A DS-structure seen as a set of distributions is a
type of Choquet capacity, and these capacities form
a particularly concise and flexible family of sets of
distributions (the full theory of Choquet capacities is
rich and of no immediate importance for us; we use
the term capacity interpretation only to indicate a set
of distributions obtained from a DS-structure in a way
we will define precisely). Interpreting DS-structures as
sets of probability distributions entails saying that the
probability of a union of outcomes $e \subseteq \Lambda$ lies between
the belief of $e$ ($\sum_{w \subseteq e} m(w)$) and the plausibility of $e$
($\sum_{w \cap e \neq \emptyset} m(w)$). The parametric representation of the
family of distributions it can represent, with parameters
$\alpha_{ew}$, $e \in 2^\Lambda$, $w \in \Lambda$, is $P(w) = \sum_e \alpha_{ew}\, m(e)$, all $w \in \Lambda$,
where $\alpha_{ew} = 0$ if $w \notin e$, $\sum_{w \in e} \alpha_{ew} = 1$, and all $\alpha_{ew}$ are
non-negative. This representation is used in Blackman
and Popoli [8, ch. 8.5.3]. The pignistic transformation
used in evidence theory to estimate a precise probability
distribution from a DS-structure is obtained by making
the $\alpha_{ew}$ equal for each $e$, $\alpha_{ew} = 1/|e|$ if $w \in e$. The
relative plausibility transformation proposed by, among
others, Voorbraak [60] and Cobb and Shenoy [12, 13],
on the other hand, is the result of normalizing the
plausibilities of the atoms in $\Lambda$. It is also possible to
translate a pdf over $\Lambda$ to a DS-structure. Indeed, a pdf
is already a (precise) DS-structure, but Sudano [58]
studied inverse pignistic transformations that result in
non-precise DS-structures by coarsening. They have
considerable appeal but are not in the main line of
argumentation in this paper [58].
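
The capacity interpretation can be made concrete with a small Python sketch: it computes belief and plausibility for a DS-structure and checks that one member of the parametric family stays between them on every event. The three-element frame, the masses, and the particular allocation are hypothetical values chosen for illustration.

```python
from itertools import combinations

FRAME = ("A", "B", "C")

# Hypothetical DS-structure: focal elements mapped to their masses.
m = {frozenset("A"): 0.5, frozenset("BC"): 0.3, frozenset("ABC"): 0.2}


def belief(m, event):
    """Sum of the masses of focal elements contained in the event."""
    return sum(mass for focal, mass in m.items() if focal <= event)


def plausibility(m, event):
    """Sum of the masses of focal elements that intersect the event."""
    return sum(mass for focal, mass in m.items() if focal & event)


def member(m, alpha):
    """P(w) = sum over e of alpha[e][w] * m(e) for a given allocation alpha."""
    return {
        w: sum(alpha[e].get(w, 0.0) * mass for e, mass in m.items())
        for w in FRAME
    }


def events(frame):
    """All non-empty subsets of the frame."""
    return [
        frozenset(c)
        for r in range(1, len(frame) + 1)
        for c in combinations(frame, r)
    ]


# One arbitrary allocation: {B, C} flows entirely to B,
# and {A, B, C} is split evenly between A and C.
alpha = {
    frozenset("A"): {"A": 1.0},
    frozenset("BC"): {"B": 1.0},
    frozenset("ABC"): {"A": 0.5, "C": 0.5},
}

p = member(m, alpha)
within_bounds = all(
    belief(m, e) - 1e-12 <= sum(p[w] for w in e) <= plausibility(m, e) + 1e-12
    for e in events(FRAME)
)
```

Any other allocation whose rows are non-negative, sum to one, and put no weight outside the focal element would pass the same check.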

It is illuminating to see how the pignistic and relative
plausibility transformations emerge from a precise
Bayesian inference: The observation space can in
this case be considered to be $2^\Lambda$, since this represents
the only distinction among observation sets surviving
from the likelihoods. The likelihood will be a function
$l : 2^\Lambda \times \Lambda \to \mathbb{R}$, the probability of seeing evidence
$e$ given world state $\lambda$. Given a precise $e \in 2^\Lambda$ as observation
and a uniform prior, the inference over $\Lambda$ would
be $f(\lambda \mid e) \propto l(e, \lambda)$, but since we in this case have a
probability distribution over the observation space, we
should use (2), weighting the likelihoods by the masses
of the DS-structures. Applying the indifference principle,
$l(e, \lambda)$ should be constant for $\lambda$ varying over the
members of $e$, for each $e$. The other likelihood values
($\lambda \notin e$) will be zero. Two natural choices of likelihood
are $l_1(e, \lambda) \propto 1$ and $l_2(e, \lambda) \propto 1/|e|$, for $\lambda \in e$. Amazingly,
these two choices lead to the relative plausibility
transformation and to the pignistic transformation,
respectively:

$$f_i(\lambda \mid m) \propto \sum_{\{e : \lambda \in e\}} m(e)\, l_i(e, \lambda) = \begin{cases} \sum_{\{e : \lambda \in e\}} m(e) \Big/ \sum_{e} |e|\, m(e), & i = 1 \\ \sum_{\{e : \lambda \in e\}} m(e)/|e|, & i = 2. \end{cases} \qquad (6)$$

Despite a lot of discussion, there seems thus to exist 
no fundamental reason to prefer one to the other, since 
they result from two different and completely plausible
statistical models and a common application of an
indifference principle. The choice between the models 
(i.e., the two proposed likelihoods) can in principle be 
determined by (statistical) testing on the application's 
historic data. 
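
The claim in (6) can be verified numerically: inference through (2) with likelihood $l_1$ reproduces the relative plausibility transformation, and with $l_2$ the pignistic transformation. The DS-structure below is a hypothetical example, and the function names are ours.

```python
FRAME = ("A", "B", "C")

# Hypothetical DS-structure: focal elements mapped to their masses.
m = {frozenset("A"): 0.5, frozenset("BC"): 0.3, frozenset("ABC"): 0.2}


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


def infer(m, frame, likelihood):
    """Finite form of (2) with a uniform prior over the frame."""
    return normalize({
        lam: sum(mass * likelihood(e, lam) for e, mass in m.items())
        for lam in frame
    })


def l1(e, lam):
    """Likelihood constant on the members of e, zero elsewhere."""
    return 1.0 if lam in e else 0.0


def l2(e, lam):
    """Likelihood 1/|e| on the members of e, zero elsewhere."""
    return 1.0 / len(e) if lam in e else 0.0


def relative_plausibility(m, frame):
    """Closed form for i = 1 in (6): normalized plausibilities of atoms."""
    denom = sum(len(e) * mass for e, mass in m.items())
    return {
        lam: sum(mass for e, mass in m.items() if lam in e) / denom
        for lam in frame
    }


def pignistic(m, frame):
    """Closed form for i = 2 in (6): each mass shared equally in e."""
    return {
        lam: sum(mass / len(e) for e, mass in m.items() if lam in e)
        for lam in frame
    }


from_l1 = infer(m, FRAME, l1)
from_l2 = infer(m, FRAME, l2)
rel_pl = relative_plausibility(m, FRAME)
pig = pignistic(m, FRAME)
```

On this example the two transformations give noticeably different probabilities for `A`, which illustrates why the choice of likelihood is a modeling decision.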

The capacity corresponding to a DS-structure can be
represented by $2^n - 2$ real numbers, since the corresponding
DS-structure is a normalized distribution over $2^n - 1$
elements (whereas an arbitrary convex set can need any
number of distributions to span it and needs an arbitrary
number of reals to represent it; thus capacities form
a proper and really small subset of all convex sets of
distributions).

It is definitely possible — although we will not elab- 
orate it here — to introduce more complex but still con- 
sistent uncertainty management by going beyond robust 
Bayesianism, grading the families of distributions and 
introducing rules on how the grades of combined
distributions are obtained from the grades of their
constituents. The grade would in some sense indicate how
plausible a distribution in the set is. It seems however 
important to caution against unnecessarily diving into 
the more sophisticated robust and graded set approaches 
to Bayesian uncertainty management. 

Finally, in multi-agent systems we must consider 
the possibility of a gaming component, where an agent 
must be aware of the possible reasoning processes of 
other agents, and use information about their actions 
and goals to decide its own actions. In this case there 
appears to be no simple way to separate — as there is 
in a single agent setting — the uncertainty domain (what 
is happening?) from the decision domain (what shall I 
do?) because these get entangled by the uncertainties 
of what other agents will believe, desire and do. This 
problem is not addressed here, but can be approached 
by game-theoretic analyses, see, e.g., [9].

A Bayesian data fusion system or subsystem can 
thus use any level in a ladder with increasing complex- 
ity: 

• Logic — no quantified uncertainty 

• Precise Bayesian fusion 

• Robust Bayesianism with DS-structures interpreted as 
capacities 

• General robust Bayesianism (or lower/upper previ- 
sions) 

• Robust Bayesianism with graded sets of distributions 

Whether or not this simplistic view (ladder of Bayes- 
ianisms) on uncertainty management is tenable in the 
long run in an educational or philosophical sense is 
currently not settled. We will not further consider the 
first and the last rungs of the ladder. 

4.1. Rounding 

A set of distributions which is not a capacity can
be approximated by rounding it to a minimal capacity
that contains it (see Fig. 1), and this rounded set can
be represented by a DS-structure. This rounding “upwards”
is accomplished by means of lower probabilities
(beliefs) of subsets of $\Lambda$. Specifically, in this example
we list the minimum probabilities of all subsets
of $\Lambda = \{A,B,C\}$ over the four corners of the polytope,
to get lower bounds for the beliefs. These can be converted
to masses using the Möbius inversion, or, in this
simple example, manually from small to large events.
For example, $m(A) = \mathrm{bel}(A)$,
$m(\{A,B\}) = \mathrm{bel}(\{A,B\}) - m(A) - m(B)$, and
$m(\{A,B,C\}) = \mathrm{bel}(\{A,B,C\}) - m(\{A,B\}) - m(\{A,C\}) - m(\{B,C\}) - m(A) - m(B) - m(C)$.
Since we have not necessarily started with a capacity,
this may give negative masses to some elements. In that
case, some mass must be moved up in the lattice to make
all masses non-negative, and this can in the general case
be done in several ways, but each way gives a minimal
enclosing polytope. In the example, we have four corners,
and the computation is shown in Table I. In this
example we immediately obtain non-negative masses,
and the rounded polytope is thus unique.
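
The computation of Table I can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used to produce the figures; the function names are our own, the corner values are the rounded entries of Table I expressed in thousandths so that the arithmetic is exact, and the belief of the whole frame is set to one by definition because the rounded corner entries do not always sum exactly to one.

```python
from itertools import combinations

FRAME = ("A", "B", "C")

# Corner distributions from Table I, in thousandths to keep arithmetic exact.
CORNERS = [
    {"A": 200, "B": 50, "C": 750},
    {"A": 222, "B": 694, "C": 83},
    {"A": 333, "B": 417, "C": 250},
    {"A": 286, "B": 179, "C": 536},
]

def events(frame):
    """All non-empty subsets of the frame, ordered from small to large."""
    return [frozenset(c) for k in range(1, len(frame) + 1)
            for c in combinations(frame, k)]

def lower_probabilities(corners, frame):
    """bel(e): minimum over the corners of the probability of event e."""
    bel = {e: min(sum(p[x] for x in e) for p in corners)
           for e in events(frame)}
    # Every distribution sums to one; the rounded table entries need not.
    bel[frozenset(frame)] = 1000
    return bel

def moebius_inversion(bel):
    """m(e) = bel(e) minus the masses of all proper non-empty subsets of e."""
    m = {}
    for e in sorted(bel, key=len):          # small events first
        m[e] = bel[e] - sum(v for s, v in m.items() if s < e)
    return m

bel = lower_probabilities(CORNERS, FRAME)
masses = moebius_inversion(bel)
for e in events(FRAME):
    print("".join(sorted(e)), bel[e] / 1000, masses[e] / 1000)
```

All masses come out non-negative here, so no mass has to be moved up in the lattice; the sketch does not implement that repair step.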

In the resulting up-rounded bba, when transforming
it to a capacity, we must consider 2 · 2 · 3 = 12 possible
corner points (one choice of receiving element for each of
the non-singleton focal elements {A,C}, {B,C} and {A,B,C}).
However, only five of these are actually
corners of the convex hull in this case, and those are the
corners visible in the enclosing capacity of Fig. 1. The
other possible corner points turn out to lie inside, or
inside the facets of, the convex hull. As an example,
consider the lowest horizontal blue-dashed line; this
is a facet of the polytope characterized by no mass
flowing to B from the focal elements {A,C}, {B,C}
and {A,B,C}. The masses of {A,C} and {A,B,C} can
thus be assigned either to A or to C. Assigning both to
C gives the left end-point of the facet, both to A gives
the right end-point, and assigning one to A and the other
to C gives two interior points on the line.
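
The enumeration of candidate corners can be illustrated with the following Python sketch. It is an assumption-laden illustration rather than the implementation behind Fig. 1: the masses are the rounded values of Table I in thousandths, the helper names are hypothetical, and the convex hull is computed with a small monotone-chain routine in the plane of the probabilities of A and B.

```python
from itertools import product

# Up-rounded masses from Table I, in thousandths (exact integer arithmetic).
MASSES = {
    "A": 200, "B": 50, "C": 83,
    "AC": 22, "BC": 534, "ABC": 111,   # {A,B} has zero mass and is omitted
}

def candidate_corners(masses):
    """Give each focal element's whole mass to one of its members, in every way."""
    focal = list(masses)
    points = []
    for choice in product(*focal):           # one receiving element per focal set
        p = {"A": 0, "B": 0, "C": 0}
        for e, x in zip(focal, choice):
            p[x] += masses[e]
        points.append((p["A"], p["B"]))      # plot coordinates used in Fig. 1
    return points

def convex_hull(points):
    """Andrew's monotone chain; collinear and interior points are dropped."""
    pts = sorted(set(points))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h[:-1]

    return half(pts) + half(reversed(pts))

candidates = candidate_corners(MASSES)
hull = convex_hull(candidates)
print(len(candidates), "candidates,", len(hull), "hull corners")
print(sorted(p for p in candidates if p[1] == 50))   # the lowest facet
```

The four candidates with the probability of B equal to 0.05 are the two end-points and the two interior points of the lowest facet discussed above.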

It is also possible, using linear programming, to 
round downwards to a maximal capacity contained in a 
set. Neither type of rounding is unique, i.e., in general 
there may be several incomparable (by set inclusion) up- 
or down-rounded capacities for a set of distributions. 

5. DECISIONS UNDER UNCERTAINTY AND 
IMPRECISION 

The ultimate use of data fusion is usually decision 
making. Precise Bayesianism results in quantities — 
probabilities of possible worlds — that can be used im- 
mediately for expected utility decision making [49, 4]. 
Suppose the profit in choosing a from a set A of possible
actions when the world state is $\lambda$ is given by the utility
function $u(a,\lambda)$ mapping action a and world state $\lambda$ to a
real valued utility (e.g., dollars). Then the action maximizing
expected profit is $\arg\max_a \int u(a,\lambda) f(\lambda \mid x)\, d\lambda$.
In robust Bayesian analysis one uses either minimax 
criteria or estimates a precise probability distribution 
to decide from. Examples of the latter are the pignistic 
and relative plausibility transformations. An example of 
a decision-theoretically motivated estimate is the maximum
entropy estimate, often used in robust probability
applications [38]. This choice can be given a decision-theoretic
motivation since it minimizes a game-theoretic
loss function, and can also be generalized to a range
of loss functions [28]. Specifically, a Decision maker
must select a distribution q while Nature selects a distribution
p from a convex set T. Nature selects an outcome
x according to its chosen distribution p, and the
Decision maker’s loss is $-\log q(x)$. This makes the
Decision maker’s expected loss equal to $E_p\{-\log q(X)\}$.
The minimum (over q) of the maximum (over p) expected
loss is then obtained when q is chosen to be the
maximum entropy distribution in T. Thus, if this loss
function is accepted, it is optimal to use the maximum
entropy transformation for decision making.

Fig. 1. Rounding a set of distributions over {A,B,C}. The coordinates are the probabilities of A and B. A set spanned by four corner
distributions (black solid), its minimal enclosing (blue dashed), and one of its maximal enclosed (red dash-dotted), capacities.

TABLE I

Rounding a Convex Set of Distributions Given by its Corners*

Focal       Corners                         min      m
A           0.200   0.222   0.333   0.286   0.200    0.200
B           0.050   0.694   0.417   0.179   0.050    0.050
C           0.750   0.083   0.250   0.536   0.083    0.083
{A,B}       0.250   0.916   0.750   0.465   0.250    0
{A,C}       0.950   0.305   0.583   0.822   0.305    0.022
{B,C}       0.800   0.777   0.667   0.715   0.667    0.534
{A,B,C}     1.000   1.000   1.000   1.000   1.000    0.111

*Corners of the black polygon of Fig. 1 are listed clockwise, starting
at bottom left.

The maximum entropy principle differs significantly
from the relative plausibility and pignistic transformations,
since it tends to select a point on the boundary of
a set of distributions (if the set does not contain the
uniform distribution), whereas the pignistic transformation
selects an interior point.

The pignistic and relative plausibility transforma- 
tions are linear estimators, by which we mean that they 
are obtained by normalization of a linear function of 
the masses in the DS-structure. If we buy the concept 
of a DS-structure as a set of possible probability distributions,
it would be natural to require that the estimate
we choose is itself a possible distribution, and then the pignistic
transformation of Smets gets the edge; it is not difficult
to prove the following:

PROPOSITION 1 The pignistic transformation is the only
linear estimator of a probability distribution from a
DS-structure that is symmetric over $\Lambda$ and always returns
a distribution in the capacity represented by the
DS-structure.
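
The membership part of Proposition 1 can be checked numerically on a small case. The Python sketch below is only an illustration under stated assumptions: it uses the second operand of Table II as the DS-structure, the function names are hypothetical, and it assumes the usual characterization of the capacity as the set of distributions p with p(e) ≥ bel(e) for every event e. On this example the pignistic transformation lies in the capacity while the relative plausibility transformation does not.

```python
from fractions import Fraction as Fr
from itertools import combinations

FRAME = "ABC"

# Second operand of Table II as a DS-structure (focal element -> mass).
M = {"A": Fr(4, 10), "B": Fr(1, 10), "C": Fr(1, 10), "BC": Fr(4, 10)}

def pignistic(m):
    """Spread the mass of each focal element evenly over its members."""
    p = dict.fromkeys(FRAME, Fr(0))
    for e, mass in m.items():
        for x in e:
            p[x] += mass / len(e)
    return p

def relative_plausibility(m):
    """Normalized plausibilities of the singletons."""
    pl = {x: sum(mass for e, mass in m.items() if x in e) for x in FRAME}
    total = sum(pl.values())
    return {x: v / total for x, v in pl.items()}

def belief(m, event):
    """Total mass of the focal elements contained in the event."""
    return sum(mass for e, mass in m.items() if set(e) <= set(event))

def in_capacity(m, p):
    """Assumed membership test: p(event) >= bel(event) for every event."""
    all_events = [c for k in range(1, len(FRAME) + 1)
                  for c in combinations(FRAME, k)]
    return all(sum(p[x] for x in ev) >= belief(m, ev) for ev in all_events)

bet_p = pignistic(M)
rel_pl = relative_plausibility(M)
print({x: float(v) for x, v in bet_p.items()}, in_capacity(M, bet_p))
print({x: float(v) for x, v in rel_pl.items()}, in_capacity(M, rel_pl))
```

A single example of course proves nothing about uniqueness; it only shows the kind of violation that rules out the relative plausibility transformation as an always-possible estimate.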

Although we have no theorem to this effect, it seems 
as if the pignistic transformation is also a reasonable 
decision-oriented estimator approximately minimizing 
the maximum Euclidean norm of difference between 
the chosen distribution and the possible distributions, 
and better than the relative plausibility transformation as 
well as the maximum entropy estimate for this objective 
function. The estimator minimizing this maximum norm 
is the center of the smallest enclosing sphere. It will not 
be linear in m, but can be computed with some effort
using methods presented, e.g., in [23]. The centroid is 
sometimes proposed as an estimator, but it does not 
correspond exactly to any known robust loss function — 
rather it is based on the assumption that the probability 
vector is uniformly distributed over the imprecision 
polytope. 
The standard expected utility decision rule in precise
probability translates in imprecise probability to
producing an expected utility interval for each decision
alternative, the utility of an action a being given
by the interval $I_a = \bigcup_{f \in F} \int u(a,\lambda) f(\lambda \mid x)\, d\lambda$. In a refinement
proposed by Voorbraak [61], decision alternatives
are compared for each pdf in the set of possible pdfs:
$I_{af} = \int u(a,\lambda) f(\lambda \mid x)\, d\lambda$, for $f \in F$. Decision a is now
better than decision b if $I_{af} > I_{bf}$ for all $f \in F$.

Some decision alternatives will fall out because they 
are dominated in utility by others, but in general several 
possible decisions with overlapping utility intervals will 
remain. In principle, if no more information exists, any 
of these decisions can be considered right. But they are 
characterized by larger or smaller risk and opportunity. 
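
For a finite state space the integrals become sums, and the interval rule and the pointwise refinement can be sketched as follows. This Python fragment is a hedged illustration only: the corner distributions are the rounded values of Table I in thousandths, while the three actions and their utilities are invented for the example and do not come from any application.

```python
# Corners of the set of possible distributions over (A, B, C), taken from
# Table I and expressed in thousandths so that all arithmetic is exact.
CORNERS = [
    (200, 50, 750),
    (222, 694, 83),
    (333, 417, 250),
    (286, 179, 536),
]

# Hypothetical utilities u(action, state); these numbers are assumptions.
UTILITY = {
    "a": (2, 1, 0),
    "b": (1, 1, 0),
    "c": (0, 0, 1),
}

def expected_utility(action, f):
    """Expected utility (in thousandths) of an action under distribution f."""
    return sum(u * p for u, p in zip(UTILITY[action], f))

def utility_interval(action):
    """Expected utility is linear in f, so its extremes occur at corners."""
    values = [expected_utility(action, f) for f in CORNERS]
    return min(values), max(values)

def better(a, b):
    """Pointwise comparison: a beats b under every possible distribution.
    The difference is linear in f, so checking the corners is enough."""
    return all(expected_utility(a, f) > expected_utility(b, f) for f in CORNERS)

for action in UTILITY:
    low, high = utility_interval(action)
    print(action, low / 1000, high / 1000)
print("a better than b:", better("a", "b"))
print("a better than c:", better("a", "c"), "; c better than a:", better("c", "a"))
```

In this example the intervals of a and b overlap, yet a is better than b for every possible distribution, so b falls out under the refinement; a and c remain incomparable.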

6. ZADEH'S EXAMPLE 

We will now discuss our problem in the context of
Zadeh’s example of two physicians who investigated
a patient independently, a case prototypical, e.g., for
the important problem of fusion for target classification.
The two physicians agree that the problem (the diagnosis
of the patient) is within the set {M,C,T}, where
M is Meningitis, C is Concussion and T is brain Tumor.
However, they express their beliefs differently, as
a probability distribution which is (0.99,0,0.01) for the
first physician and (0,0.99,0.01) for the second. The
question is what a third party can say about the patient’s
condition with no more information than that given. If
the two expert opinions are taken as likelihoods, or as
posteriors with a common uniform prior, this problem
is solved by taking Laplace’s parallel composition (1)
of the two probability vectors, giving the result (0,0,1),
i.e., the case T is certain. This example has been discussed
a lot in the literature, see e.g. [53]. It is a classical
example of how two independent sets of observations
can together eliminate cases to end up with a case not
really indicated by either of the two sets in isolation.
Several such examples have been brought up as good
and prototypical in the Bayesian literature, e.g., in [38].
However, in the evidence theory literature the Bayesian
solution (which is also obtained from using Dempster’s
and the Modified Dempster’s rule) has been considered
inadequate and this particular example has been the
starting point for several proposals of alternative fusion
rules.
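
The arithmetic of the example is easily reproduced. The following Python sketch, with a function name of our own choosing, multiplies the two vectors component-wise and normalizes; exact fractions are used so that the zeros stay exact.

```python
from fractions import Fraction as Fr

def parallel_composition(f1, f2):
    """Component-wise product of two probability vectors, normalized."""
    prod = [a * b for a, b in zip(f1, f2)]
    total = sum(prod)
    if total == 0:
        raise ValueError("the operands have no common possible state")
    return [v / total for v in prod]

# Opinions over (M, C, T) of the two physicians in Zadeh's example.
physician_1 = [Fr(99, 100), Fr(0), Fr(1, 100)]
physician_2 = [Fr(0), Fr(99, 100), Fr(1, 100)]

fused = parallel_composition(physician_1, physician_2)
print([float(v) for v in fused])
```

Only the state T has non-zero probability under both opinions, so all of the posterior mass ends up there regardless of how small the two probabilities of T are, as long as they are positive.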

The following are reactions I have met from profes- 
sionals — physicians, psychiatrists, teachers and military 
commanders — confronted with similar problems. They 
are also prototypical for current discussions on evidence 
theory. 

• One of the experts probably made a serious mistake. 

• These experts seem not to know what probability zero 
means, and should be sent back to school. 

• It is completely plausible that one eliminated M and
the other C in a sound way. So T is the main alternative,
or rather T or something else, since there are
most likely more possibilities left.

• It seems as if estimates are combined at a too coarse
level: it is in this case necessary to distinguish in $\Lambda$
between different cases of the three conditions that
are most likely to affect the likelihoods from observations:
type, size and position of tumor; bacterial, viral
or purely inflammatory meningitis; position of concussion.
The frame of discernment should thus not be
determined solely from the frame of interest, but also
from what one could call homogeneity of likelihoods or
evidence.

• The assessments for T are probably based mostly 
on prior information (rareness) or invisibility in a 
standard MR scan, so the combined judgment should 
not make T less likely, rather the opposite. 

• An investigation is always guided by the patient’s 
subjective beliefs, and an investigation affects those 
beliefs. So it is implausible that the two investigations 
of the same patient are “really” independent. This 
is a possible explanation for the Ulysses syndrome, 
where persons are seen to embark on endless journeys 
through the health care system. This view would 
call for a game-theoretic approach (with parameters 
difficult to assess). 

What the example reactions teach us is that sub- 
jects confronted with paradoxical information typically 
start building their own mental models about the case 
and insist on bringing in more information, in the form 
of information about the problem area, the observation 
protocols underlying the assessments, a new investigation,
or pure speculation. The professionals’ handling of
the information problem is usually rational enough, but 
very different conclusions arise from small differences 
in mental models. This is a possible interpretation of the 
prospect theory of Kahneman and Tversky [40]. 

To sum things up, if we are sure that the experts
are reliable and have the same definitions of the three
neurological conditions, the result given by Bayes’ and
Dempster’s rules is appropriate. If not, the assumptions
and hence the statistical model must be modified.
It seems obvious that the decision maker’s belief in the
experts’ reliability must be explicitly elicited in similar
situations.

7. FUSION IN EVIDENCE AND ROBUST BAYESIAN 
THEORY 

The Dempster-Shafer combination rule [51] is a 
straightforward generalization of Laplace’s parallel 
composition rule. By this statement we do not claim 
that this is the way DS theory is usually motivated. 
But the model in which Dempster’s rule is motivated 
[18] is different from ours: there it is assumed that each 
source has its own possible world set, but precise beliefs 
about it. The impreciseness results only from a multi- 
valued mapping, ambiguity in how the information of 
the sources should be translated to a common frame
of discernment. It is fairly plausible that the information
given by the source is well representable as a
DS-structure interpreted as a capacity. What is much less
plausible is that the information combined from several
sources is well captured by Dempster’s rule rather than
by the Fixsen/Mahler combination rule or the robust
combination rule to be described shortly. The precise assumptions
behind Dempster’s rule are seldom explained
in tutorials and seem not well known, so we recapitulate
them tersely: It is assumed that evidence comes from a
set of sources, where source i has obtained a precise
probability estimate $p_i$ over its private frame $X_i$. This
information is to be translated into a common frame $\Lambda$,
but only a multi-valued mapping $\Gamma_i$ is available, mapping
elements of $X_i$ to subsets of $\Lambda$. For the tuple of elements
$x_1, \ldots, x_n$, their joint probability could be guessed
to be $p_1(x_1) \cdots p_n(x_n)$, but we have made assumptions
such that we know that this tuple is only possible if
$\Gamma_1(x_1) \cap \cdots \cap \Gamma_n(x_n)$ is non-empty. So the probabilities
of tuples should be added to the probabilities of the
corresponding subsets of $\Lambda$, and then conditioning on non-emptiness
should be performed and the remaining subset probabilities
normalized, a simple application of (1). From
these assumptions Dempster’s rule follows.

This is postulated by Dempster as the model re- 
quired. One can note that it is not based on inference, but 
derived from an explicit and exact probability model. It 
was claimed incoherent (i.e. violating the consistent bet- 
ting paradigm) by Lindley [42], but Goodman, Nguyen 
and Rogers showed that it is not incoherent [25]. In- 
deed, the assumption of multi-valued mappings seems 
completely innocent, if somewhat arbitrary, and it would 
be unlikely to lead to inconsistencies. The recently introduced
Fixsen/Mahler MDS combination rule [22] involves
a re-weighting of the terms involved in the set intersection
operation: whereas Dempster’s combination
rule can be expressed as

$$m_{DS}(e) \propto \sum_{e = e_1 \cap e_2} m_1(e_1)\, m_2(e_2), \quad e \neq \emptyset, \tag{7}$$

the MDS rule is

$$m_{MDS}(e) \propto \sum_{e = e_1 \cap e_2} m_1(e_1)\, m_2(e_2)\, \frac{|e|}{|e_1|\,|e_2|}, \quad e \neq \emptyset. \tag{8}$$



The MDS rule was introduced to account for non- 
uniform prior information about the world and evidence 
that contains prior information common to all sources. 
In this case $|e|$, etc., in the formula are replaced by the
prior probabilities of the respective sets. The rule (8)
is completely analogous to (4): the denominator of the
correction term takes the priors out of the posteriors of
both operands, and the numerator $|e|$ reinserts it once
in the result. But as we now will see, the MDS rule
can also be considered a natural result of fusing likelihood
describing information with a different likelihood
function.
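
Rules (7) and (8) differ only in the weight given to each pair of intersecting focal elements, which the following Python sketch makes explicit. It is a minimal illustration with helper names of our own, restricted to the uniform-prior case where set cardinalities are used in (8); the operands are the two DS-structures of Table II, so the output can be compared with Table III.

```python
from fractions import Fraction as Fr

def fs(elements):
    """Shorthand for a focal element, e.g. fs("BC") is {B, C}."""
    return frozenset(elements)

def combine(m1, m2, weight):
    """Sum weighted mass products over all pairs of focal elements with a
    non-empty intersection, then normalize."""
    out = {}
    for e1, a in m1.items():
        for e2, b in m2.items():
            e = e1 & e2
            if e:
                out[e] = out.get(e, 0) + a * b * weight(e, e1, e2)
    total = sum(out.values())
    if total == 0:
        raise ValueError("totally conflicting operands")
    return {e: v / total for e, v in out.items()}

def dempster(m1, m2):
    """Rule (7): every non-empty intersection has weight one."""
    return combine(m1, m2, lambda e, e1, e2: 1)

def mds(m1, m2):
    """Rule (8) with a uniform prior: weight |e| / (|e1| |e2|)."""
    return combine(m1, m2, lambda e, e1, e2: Fr(len(e), len(e1) * len(e2)))

# The two operands of Table II.
OP1 = {fs("A"): Fr(2, 10), fs("B"): Fr(2, 10), fs("C"): Fr(3, 10), fs("BC"): Fr(3, 10)}
OP2 = {fs("A"): Fr(4, 10), fs("B"): Fr(1, 10), fs("C"): Fr(1, 10), fs("BC"): Fr(4, 10)}

ds_result = dempster(OP1, OP2)
mds_result = mds(OP1, OP2)
for name, m in (("DS", ds_result), ("MDS", mds_result)):
    print(name, {"".join(sorted(e)): round(float(v), 3) for e, v in m.items()})
```

For a non-uniform prior the cardinalities in the weight function would be replaced by prior probabilities of the sets, which this sketch does not attempt.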

It is possible to analyze the source fusion problem
in a (precise) Bayesian setting. If we model the
situation with the likelihoods on $2^\Lambda \times \Lambda$ of (6), Section 4,
we find the task of combining the two likelihoods
$\sum_e m_1(e)\, l(e,\lambda)$ and $\sum_e m_2(e)\, l(e,\lambda)$ using Laplace’s parallel
composition as in (2) over $\Lambda$, giving

$$f(\lambda) \propto \sum_{e_1, e_2} m_1(e_1)\, m_2(e_2)\, l_i(e_1,\lambda)\, l_i(e_2,\lambda).$$

For the choice i = 1, this gives the relative plausibility
of the result of fusing the evidences with Dempster’s
rule; for the likelihood $l_2$ associated with the pignistic
transformation, we get
$f(\lambda) \propto \sum_{e_1, e_2} m_1(e_1)\, m_2(e_2)\, l_1(e_1,\lambda)\, l_1(e_2,\lambda) / (|e_1|\,|e_2|)$.
This is the pignistic transformation of the
result of combining $m_1$ and $m_2$ using the MDS rule.
In the discussions for and against different combina- 
tion and estimation operators, it has sometimes been 
claimed that the estimation operator should propagate 
through the combination operator. This claim is only 
valid if the above indicated precise Bayesian approach is 
bought, which would render DS-structures and convex 
sets of distributions unnecessary. In the robust Bayesian 
framework, the maximum entropy estimate is com- 
pletely kosher, but it does not propagate through any 
well known combination operation. The combination of 
Dempster’s rule and the pignistic transformation cannot
easily be defended in a precise Bayesian framework, but 
Dempster’s rule can be defended under the assumption 
of multi-valued mappings and reliable sources, whereas 
the pignistic transformation can be defended in three 
ways: (1) It can be seen as “natural” since it results, e.g., 
from an indifference principle applied to the parametric
representation of Blackman and Popoli; (2) Smets’
argument [54] is that the estimation operator (e.g., the
pignistic transformation) should propagate, not through
the combination operator, but through linear mixing; (3)
An even more convincing argument would relate to decisions
made, e.g., it seems as if the pignistic transformation
is, not exactly but approximately, minimizing
the maximum (over Nature’s choice) error, measured
as the Euclidean norm of the difference
between the selected distribution and Nature’s
choice.

7.1. The Robust Combination Rule

The combination of evidence — likelihood functions 
normalized so they can be seen as probability distribu- 
tions — and a prior over a finite space is thus done simply 
by component-wise multiplication followed by normal- 
ization [41, 57]. The resulting combination operation 
agrees with the DS and the MDS rules for precise be- 
liefs. The robust Bayesian version of this would replace 
the probability distributions by sets of probability distri- 
butions, for example represented as DS-structures. The 
most obvious combination rule would yield the set of
probability functions that can be obtained by taking one
member from each set and combining them. Intuitively,
membership means that the distribution can possibly be
right, and we would get the final result, a set of distributions
that can be obtained by combining a number of
distributions each of which could possibly be right. The
combination rule (3) would thus take the form (where
F denotes convex families of functions):

$$F(\lambda \mid \{x_1, x_2\}) \propto F(\{x_1, x_2\} \mid \lambda) \times F(\lambda) = F(x_1 \mid \lambda) \times F(x_2 \mid \lambda) \times F(\lambda). \tag{9}$$

Definition 1 The robust Bayesian combination operator
$\times$ combines two sets of probability distributions
over a common space $\Lambda$. The value of $F_1 \times F_2$ is
$\{c f_1 f_2 : f_1 \in F_1,\ f_2 \in F_2,\ c = 1 / \sum_{\lambda \in \Lambda} f_1(\lambda) f_2(\lambda)\}$.

The operator can easily be applied to give too much 
impreciseness, for reasons similar to the corresponding 
problem in interval arithmetic: the impreciseness of like- 
lihood functions has typically a number of sources, and 
the proposed technique can give too large uncertainties 
when these sources do not have their full range of varia- 
tion within the evidences that will be combined. A most 
extreme example is the sequence of plots returned by a 
sensor: variability can have its source in the target, in the 
sensor itself, and in the environment. But when a particular
sensor follows a particular target, the variability of
these sources is not fully materialized. The variability
has its source only in the state (distance, inclination, etc) 
of the target, so it would seem wasteful to assume that 
each new plot comes from an arbitrarily selected sensor 
and target. This, and similar problems, are inherent in 
system design, and can be addressed by detailed analy- 
ses of sources of variation, if such are feasible. 

We must now explain how to compute the opera- 
tor of Definition 1. The definition given of the robust 
Bayesian combination operator involves infinite sets in 
general and is not computable directly. For singleton 
sets it is easily computed, though, with Laplace’s par- 
allel composition rule. It is also the case that every cor- 
ner in the resulting set can be generated by combining 
two corners, one from each of the operands. This ob- 
servation gives the method for implementation of the 
robust operator. After the potential corners of the re- 
sult have been obtained, a convex hull computation as 
found, e.g., in MATLAB and OCTAVE, is used to tes- 
sellate the boundary and remove those points falling in 
the interior of the polytope. The figures of this paper 
were produced by a Matlab implementation of robust 
combination, Dempster’s and the MDS rule, maximum 
entropy estimation, and rounding. The state of the art 
in computational geometry software thus allows easy 
and efficient solutions, but of course as the state space 
and/or the number of facets of the imprecision poly- 
topes become very large, some tailored approximation 
methods will be called for. The DS and MDS rules have 
exponential complexity in the worst case. The robust
rule will have a complexity quadratic in the number of
corners of the operands, and will thus depend on rounding
for feasibility. For very high-dimensional problems
additional pruning of the corner set will be necessary
(as is also the case with the DS and MDS operators).
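
The corner-pairing step of this method can be sketched in Python as follows. This is an illustration under assumptions, not the implementation described above: the function names are ours, the operands are the two line segments of the example in Section 8, and the subsequent convex hull computation that would remove interior candidates is deliberately left out (in this small example all four candidates turn out to be corners).

```python
from fractions import Fraction as Fr
from itertools import product

def parallel_composition(f1, f2):
    """Component-wise product of two distributions, normalized (Definition 1
    applied to singleton sets)."""
    prod = [a * b for a, b in zip(f1, f2)]
    total = sum(prod)
    if total == 0:
        raise ValueError("the two distributions have disjoint supports")
    return tuple(v / total for v in prod)

def robust_candidates(corners1, corners2):
    """Combine every corner of the first operand with every corner of the
    second; the corners of the result are among these candidates."""
    return [parallel_composition(f1, f2)
            for f1, f2 in product(corners1, corners2)]

# Corners spanning the two operands of Table II, over (A, B, C).
OP1 = [(Fr(2, 10), Fr(2, 10), Fr(6, 10)), (Fr(2, 10), Fr(5, 10), Fr(3, 10))]
OP2 = [(Fr(4, 10), Fr(1, 10), Fr(5, 10)), (Fr(4, 10), Fr(5, 10), Fr(1, 10))]

candidates = robust_candidates(OP1, OP2)          # order: c11, c12, c21, c22
for c in candidates:
    print([round(float(v), 3) for v in c])
print("range of P(A):",
      float(min(c[0] for c in candidates)),
      float(max(c[0] for c in candidates)))
```

The number of candidates is the product of the numbers of corners of the operands, which is the quadratic growth mentioned above.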

We can now make a few statements, most of which 
are implicitly present in [19, Discussion by Aitchison] 
and [32], about fusion in the robust Bayesian frame- 
work: 

• The combination operator is associative and commu- 
tative, since it inherits these properties from the mul- 
tiplication operator it uses. 

• Precise beliefs combined give the same result as
Dempster’s rule and yield new precise beliefs.

• A precise belief combined with an imprecise belief 
will yield an imprecise belief in general — thus Demp- 
ster’s rule underestimates imprecision compared to 
the robust operator. 

• Ignorance is represented by a uniform precise belief, 
not by the vacuous assignment of DS-theory. 

• The vacuous belief in the robust framework is a 
belief that represents total skepticism, and will when 
combined with anything yield a new vacuous belief (it 
is thus an absorbing element). This belief has limited 
use in the robust Bayesian context. 

• Total skepticism cannot be expressed with Demp- 
ster’s rule, since it never introduces a focal element 
which is a superset of all focal elements in one 
operand. 

Definition 2 A rounded robust Bayesian combination
operator combines two sets of probability distributions
over a common space $\Lambda$. The robust operation is applied
to the rounded operands, and the result is then rounded.

An important and distinguishing property of the 
robust rule is: 

OBSERVATION 1 The robust combination operator is,
and the rounded robust operator can be made (note: it
is not unique) monotone with respect to imprecision, i.e.,
if $F_1' \subset F_1$, then $F_1' \times F_2 \subset F_1 \times F_2$.

PROPOSITION 2 For any combination operator $\times'$ that
is monotone wrt imprecision and is equal to the Bayesian
(Dempster’s) rule for precise arguments, $F_1 \times F_2 \subset F_1 \times' F_2$,
where $\times$ is the robust rule.

Proof By contradiction; thus assume there is an $f \in F_1 \times F_2$
with $f \notin F_1 \times' F_2$. By the definition of $\times$,
$\{f\} = \{f_1\} \times \{f_2\}$ for some $f_1 \in F_1$ and $f_2 \in F_2$. But then
$\{f\} = \{f_1\} \times' \{f_2\}$, and since $\times'$ is monotone wrt imprecision,
$f \in F_1 \times' F_2$, a contradiction.

We can also show that the MDS combination rule 
has the “nice” property of giving a result that always 
overlaps the robust rule result, under the capacity inter- 
pretation of DS-structures: 

PROPOSITION 3 Let $m_1$ and $m_2$ be two DS-structures and
let $F_1$ and $F_2$ be the corresponding capacities. If F is the
capacity representing $m = m_1 *_{MDS} m_2$ and $F'$ is $F_1 \times F_2$,
then F and $F'$ overlap.

Proof Since the pignistic transformation propagates
through the MDS combination operator, and by Proposition 1
the pignistic transformation is a member of the
capacity of the DS-structure, the parallel combination
of the pignistic transformations of $m_1$ and $m_2$ is a member
of $F'$ and equal to the pignistic transformation of
m, which for the same reason is a member of F. This
concludes the proof.
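
The propagation property used in the proof, and the corresponding property of the relative plausibility transformation under Dempster’s rule mentioned in Section 7, can be checked numerically. The Python sketch below is only an illustration with hypothetical helper names; it uses the operands of Table II and the uniform-prior (cardinality) form of the MDS rule.

```python
from fractions import Fraction as Fr

FRAME = "ABC"

def fs(elements):
    return frozenset(elements)

def combine(m1, m2, weight):
    """Weighted, normalized sum over pairs with non-empty intersection."""
    out = {}
    for e1, a in m1.items():
        for e2, b in m2.items():
            e = e1 & e2
            if e:
                out[e] = out.get(e, 0) + a * b * weight(e, e1, e2)
    total = sum(out.values())
    return {e: v / total for e, v in out.items()}

def dempster(m1, m2):
    return combine(m1, m2, lambda e, e1, e2: 1)

def mds(m1, m2):
    return combine(m1, m2, lambda e, e1, e2: Fr(len(e), len(e1) * len(e2)))

def pignistic(m):
    """Each focal element shares its mass equally among its members."""
    return tuple(sum(v / len(e) for e, v in m.items() if x in e) for x in FRAME)

def relative_plausibility(m):
    """Normalized plausibilities of the singletons."""
    pl = [sum(v for e, v in m.items() if x in e) for x in FRAME]
    return tuple(v / sum(pl) for v in pl)

def parallel_composition(f1, f2):
    prod = [a * b for a, b in zip(f1, f2)]
    return tuple(v / sum(prod) for v in prod)

OP1 = {fs("A"): Fr(2, 10), fs("B"): Fr(2, 10), fs("C"): Fr(3, 10), fs("BC"): Fr(3, 10)}
OP2 = {fs("A"): Fr(4, 10), fs("B"): Fr(1, 10), fs("C"): Fr(1, 10), fs("BC"): Fr(4, 10)}

lhs_mds = pignistic(mds(OP1, OP2))
rhs_mds = parallel_composition(pignistic(OP1), pignistic(OP2))
lhs_ds = relative_plausibility(dempster(OP1, OP2))
rhs_ds = parallel_composition(relative_plausibility(OP1), relative_plausibility(OP2))
print(lhs_mds == rhs_mds, [float(v) for v in lhs_mds])
print(lhs_ds == rhs_ds, [round(float(v), 3) for v in lhs_ds])
```

Both comparisons hold exactly for these operands, which illustrates, but does not replace, the general argument.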

The argument does not work for the original Dempster’s
rule, for reasons that will become apparent in the
next section. It was proved by Jaffray [37] that Dempster’s
rule applied with one operand being precise gives
a (precise) result inside the robust rule polytope. The
same holds of course, by Proposition 3, for the MDS 
rule. We can also conjecture the following, based on ex- 
tensive experimentation with our prototype implemen- 
tation, but have failed in obtaining a short convincing 
proof: 

CONJECTURE 1 The MDS combination rule always gives 
a result which is, in the capacity interpretation, a subset of 
the robust rule result. The MDS combination rule is also 
a coarsest symmetric bilinear operator on DS-structures 
with this property. 

8. A PARADOXICAL EXAMPLE 

In [1] we analyzed several versions of Zadeh’s example
with “discounted” evidences to illustrate the differences
between robust fusion and the DS and MDS
rules, as well as some different methods to summarize 
a convex set of pdfs as a precise pdf. Typically, the 
DS and MDS rules give much smaller imprecision in 
the result than the robust rule, which can be expected 
from their behavior with one precise and one imprecise 
operand. One would hope that the operators giving less 
imprecision would fall inside the robust rule result, in 
which case one would perhaps easily find some plausi- 
ble motivation for giving less imprecision than indicated 
in the result. In practice this would mean that a system 
using robust fusion would sometimes find that there is 
not a unique best action while a system based on the 
DS or MDS rule would pick one of the remaining ac- 
tions and claim it best, which is not obviously a bad 
thing. However, the DS, MDS and robust rules do not 
only give different imprecision in their results, they are 
also pairwise incompatible (sometimes having an empty 
intersection) except for the case mentioned in Conjecture 1.
Here we will concentrate on a simple, somewhat
paradoxical, case of combining two imprecise evidences
and deciding from the result.

Varying the parameters of discounting a little in 
Zadeh's example, it is not difficult to find cases where 
Dempster’s rule gives a capacity disjoint (regarded as 
a geometric polytope) from the robust rule result. A 
simple Monte Carlo search indicates that disjointness
does indeed happen in general, but infrequently. Typically,
Dempster’s rule gives an uncertainty polytope
that is clearly narrower than that of the robust rule, 
and enclosed in it. In Fig. 2 we show an example 
where this is not the case. The two combined evi- 
dences are imprecise probabilities over three elements 
A, B and C, the first spanned by the probability distri- 
butions (0.2, 0.2, 0.6) and (0.2, 0.5, 0.3), the second by 
(0.4, 0.1, 0.5) and (0.4, 0.5, 0.1). These operands can be 
represented as DS structures, as shown in Table II, and 
they are shown as vertical green lines in Fig. 2. They 
can be combined with either the DS rule, the MDS rule, 
or the robust rule, as shown in Table III. The situation is 
illustrated in Fig. 2, where all sets of pdfs are depicted 
as lines or polygons projected on the first two proba- 
bilities. The figure shows that the robust rule claims the 
probability of the first event A (horizontal axis) to be 
between 0.2 and 0.33, whereas Dempster’s rule would 
give it an exact probability around 0.157. The MDS 
rule gives a result that falls nicely inside the robust rule 
result, but it claims an exact value for the probability 
of A, namely 0.25. Asked to bet with odds six to one 
on the first event (by which we mean that the total gain 
is six on success and the loss is one on failure), the 
DS rule says decline, the robust and MDS rules say 
accept. For odds strictly between four and five to one, 
the robust rule would hesitate and MDS would still say 
yes. For odds strictly between three and four to one, DS 
and MDS would decline whereas the robust rule would 
not decide for or against. Including the refinement pro- 
posed by Voorbraak (see Section 5) would not alter this 
conclusion unless the imprecisions of the two operands 
were coupled, e.g., by common dependence on a third 
quantity. 
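
The betting behavior just described can be tabulated with a short Python sketch. It is an illustration under the stated convention only: a bet at odds k to one returns k in total on success for a stake of one, so the expected net gain is pk - 1 for success probability p; the function name and the three-valued verdict are our own, and the probabilities of A are the exact values behind the rounded entries of Table III.

```python
from fractions import Fraction as Fr

def verdict(p_low, p_high, odds):
    """Decide on a bet returning `odds` in total on success for a stake of one.
    The expected net gain is p * odds - 1 for success probability p."""
    if p_low * odds > 1:
        return "accept"       # favorable under every possible probability
    if p_high * odds < 1:
        return "decline"      # unfavorable under every possible probability
    return "undecided"        # the interval straddles the break-even point

# Lower and upper probability of the event A claimed by each rule.
RULES = {
    "DS": (Fr(8, 51), Fr(8, 51)),
    "MDS": (Fr(1, 4), Fr(1, 4)),
    "robust": (Fr(1, 5), Fr(1, 3)),
}

for odds in (6, Fr(9, 2), Fr(7, 2)):
    print(float(odds), {name: verdict(low, high, odds)
                        for name, (low, high) in RULES.items()})
```

The odds 4.5 and 3.5 are arbitrary representatives of the two open intervals mentioned in the text.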

In an effort to reconcile Bayesian and belief meth- 
ods, Blackman and Popoli [8, ch. 7] propose that the 
result of fusion should be given the capacity interpre- 
tation as a convex set, whereas the likelihoods should 
not — an imprecise likelihood should instead be repre- 
sented as the coarsest enclosing DS-structure having the 
same pignistic transformation as the original one. When 
combined with Dempster’s rule, the result is again a 
prior for the next combination whose capacity interpre- 
tation shows its imprecision. The theorem proved — at 
some length — in [8, App. 8A] essentially says that this 
approach is compatible with our robust rule for precise
likelihoods. In our example, if the second operand
is coarsened to $\{m_2'(A) \mapsto 0.1,\ m_2'(\{A,B,C\}) \mapsto 0.9\}$, the
fusion result will be a vertical line at 0.217, going from
0.2 to 0.49, just inside the robust rule result. However
no mass will be assigned to a non-singleton set con- 
taining A, so the rule still gives a precise value to the 
probability of A. The philosophical justification of this 
approach appears weak. 
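
The numbers in this coarsening example can be reproduced with the Python sketch below. It is a hedged illustration with helper names of our own: it checks that the coarsened second operand has the same pignistic transformation as the original one, fuses it with the first operand of Table II using Dempster’s rule, and reads off the lower and upper probabilities of A and B.

```python
from fractions import Fraction as Fr

FRAME = "ABC"

def fs(elements):
    return frozenset(elements)

def dempster(m1, m2):
    """Dempster's rule: mass products on non-empty intersections, normalized."""
    out = {}
    for e1, a in m1.items():
        for e2, b in m2.items():
            e = e1 & e2
            if e:
                out[e] = out.get(e, 0) + a * b
    total = sum(out.values())
    return {e: v / total for e, v in out.items()}

def pignistic(m):
    return tuple(sum(v / len(e) for e, v in m.items() if x in e) for x in FRAME)

def probability_range(m, x):
    """Belief and plausibility of the singleton {x}."""
    low = m.get(fs(x), Fr(0))
    high = sum(v for e, v in m.items() if x in e)
    return low, high

OP1 = {fs("A"): Fr(2, 10), fs("B"): Fr(2, 10), fs("C"): Fr(3, 10), fs("BC"): Fr(3, 10)}
OP2 = {fs("A"): Fr(4, 10), fs("B"): Fr(1, 10), fs("C"): Fr(1, 10), fs("BC"): Fr(4, 10)}
OP2_COARSE = {fs("A"): Fr(1, 10), fs("ABC"): Fr(9, 10)}

same_pignistic = pignistic(OP2) == pignistic(OP2_COARSE)
fused = dempster(OP1, OP2_COARSE)
range_a = probability_range(fused, "A")
range_b = probability_range(fused, "B")
print(same_pignistic)
print("A:", [round(float(v), 3) for v in range_a])
print("B:", [round(float(v), 3) for v in range_b])
```

The lower and upper probabilities of A coincide, which is the precise value for A noted above.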

The example shows that Dempster’ s rule is not com- 
patible with the capacity interpretation, whereas the 
MDS rule is: there is no pair of possible pdfs for the 
operands that combine to any possible value in the 
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Fig. 2. A case where the robust rule and Dempster’s rule give paradoxical results. The coordinates are the probabilities of A and B. The 
operands are shown in green dashed, the result of the robust combination rule is shown in black solid (same as in Fig. 1), Dempster’s rule 
gives the result shown in red dotted, the Fixsen/Mahler MDS rule shown in blue dash-dotted lines. 



Dempster’ s rule result, wheras every possible pdf in the 
MDS rule results from combining some pair of possible 
pdfs for the operands. If Conjecture 1 can be proved, the 
last is true for all pairs of operands, but there are also 
many particular examples where even Dempster’s rule 
gives a compatible result. It has been noted by Walley 
that Dempster’s rule is not the same as the robust combi- 
nation rule [62], but I have not seen a demonstration that 
the two are incompatible in the above sense. There is, of 
course, a rational explanation of the apparent paradox, 
namely that the assumptions of private frames of dis- 
cernment for sources and of a multi-valued mapping for 
each source is very different from the assumption of im- 
precise likelihoods, and this means that some informa- 



TABLE II 

Two Operands of the Paradoxical Example* 



Focal 


op l 


op 2 




c t 


C 2 


m 


c t 


c 2 


m 


A 


0.2 


0.2 


0.2 


0.4 


0.4 


0.4 


B 


0.2 


0.5 


0.2 


0.1 


0.5 


0.1 


C 


0.6 


0.3 


0.3 


0.5 


0.1 


0.1 


{B,C} 






0.3 






0.4 



‘Columns marked m denote DS-structures and those marked c l , c, 
denote corners spanning the corresponding capacity. Values are exact. 



TABLE III 
Fusing the Operands of Table II with the DS, MDS and Robust Rules* 

Focal      DS                      MDS                     Robust                          Uprounded
           c1      c2      m       c1      c2      m       c11     c22     c12     c21     m
A          0.157   0.157   0.157   0.250   0.250   0.250   0.200   0.222   0.333   0.286   0.200
B          0.255   0.490   0.255   0.422   0.234   0.234   0.050   0.694   0.417   0.179   0.050
C          0.588   0.353   0.353   0.328   0.516   0.328   0.750   0.083   0.250   0.536   0.083
{A,B}                      0                       0                                       0
{A,C}                      0                       0                                       0.022
{B,C}                      0.235                   0.188                                   0.534
{A,B,C}                    0                       0                                       0.111

*The result for DS and MDS is shown as two corners (c1 and c2), and as an equivalent DS-structure (m). For the robust rule result, its four 
spanning corners are shown, where, e.g., c21 was obtained by combining the second corner c2 of op1 with c1 of op2, etc. These corners are 
the corners of the black polygon in Fig. 2. The robust rule result is also shown as a DS-structure for the up-rounded result (blue dashed line 
in Fig. 1). Values are rounded to three decimals. 
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tion in the private frames is still visible in the end result 
when Dempster’s rule is used. Thus Dempster’s rule 
effectively makes a combination in the frame 2^A instead 
of in A as done by the robust rule. It is perhaps more 
surprising that the paradoxical result is also obtainable 
in the frame A using precise Bayesian analysis and the 
likelihood / j(e,A) (see Section 4). The main lesson here, 
as in other places, is that we should not use Dempster’s 
rule unless we have reason to believe that imprecision 
is produced by the multi-valued mapping of Dempster’s 
model rather than Fixsen/Mahler’s model or incomplete 
knowledge of sampling functions and prior. If the MDS 
operator is used to combine likelihoods or a likelihood 
and a prior, then posteriors should be combined using 
the MDS rule (8), but with all set cardinalities squared. 
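
The entries of Table III can be recomputed from the DS-structures of Table II with a short Python sketch. It assumes that the MDS rule is applied with a uniform prior, so that each product m1(X)m2(Y) is weighted by |X∩Y|/(|X||Y|); this assumption reproduces the MDS columns of Table III. The function names (combine, corners_bc, bayes) are illustrative only, and corners_bc handles only the case of this example, where {B,C} is the single non-singleton focal set.

```python
from itertools import product

FRAME = ("A", "B", "C")

# DS-structures from the columns marked m in Table II: focal set -> mass
op1 = {frozenset("A"): 0.2, frozenset("B"): 0.2,
       frozenset("C"): 0.3, frozenset("BC"): 0.3}
op2 = {frozenset("A"): 0.4, frozenset("B"): 0.1,
       frozenset("C"): 0.1, frozenset("BC"): 0.4}


def combine(m1, m2, weighted=False):
    """Dempster's rule; with weighted=True every product is multiplied by
    |X & Y| / (|X| |Y|), the MDS weight assumed here (uniform prior)."""
    out = {}
    for (x, a), (y, b) in product(m1.items(), m2.items()):
        z = x & y
        if not z:
            continue  # conflicting pair, dropped and renormalized below
        w = len(z) / (len(x) * len(y)) if weighted else 1.0
        out[z] = out.get(z, 0.0) + a * b * w
    total = sum(out.values())
    return {z: v / total for z, v in out.items()}


def corners_bc(m):
    """Corners c1, c2 of the capacity: the {B,C} mass goes wholly to C (c1)
    or wholly to B (c2). Valid only when {B,C} is the sole non-singleton."""
    s = m.get(frozenset("BC"), 0.0)
    a, b, c = (m.get(frozenset(e), 0.0) for e in FRAME)
    return [a, b, c + s], [a, b + s, c]


def bayes(p, q):
    """Normalized pointwise product of two pdfs over FRAME."""
    prod = [u * v for u, v in zip(p, q)]
    total = sum(prod)
    return [v / total for v in prod]


ds_result = combine(op1, op2)
mds_result = combine(op1, op2, weighted=True)

# Robust rule: combine every corner of op1 with every corner of op2.
robust = {f"c{i + 1}{j + 1}": bayes(p, q)
          for i, p in enumerate(corners_bc(op1))
          for j, q in enumerate(corners_bc(op2))}

for name, m in (("DS", ds_result), ("MDS", mds_result)):
    print(name, {"".join(sorted(z)): round(v, 3) for z, v in m.items()})
for name, pdf in robust.items():
    print(name, [round(v, 3) for v in pdf])
```

In this example Dempster’s rule assigns A the probability 0.157 in both of its corners, which is below 0.2, the smallest probability of A among the four robust corners, whereas the MDS value 0.25 lies within the robust range.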

Excluding Bayesian thinking from fusion may well 
lead to inferior designs. 

9. CONCLUSIONS 

Despite the normative claims of evidence theory and 
robust Bayesianism, the two have been considered dif- 
ferent in their conclusions and general attitude towards 
uncertainty. The Bayesian framework can however de- 
scribe most central features of evidence theory, and is 
thus a useful basis for teaching and comparison of dif- 
ferent detailed approaches to information fusion. The 
teaching aspect is not limited to persuading engineers 
to think in certain ways. For higher level uncertainty 
management, dealing with quantities recognizable to 
users like medical researchers, military commanders, 
and their teachers in their roles as evaluators, the need 
for clarity and economy of concepts cannot be exag- 
gerated. The arguments put forward above suggest that 
an approach based on the precise Bayesian and the ro- 
bust Bayesian fusion operator is called for, and that 
choosing decision methods based on imprecise prob- 
abilities or DS-structures should preferably be based on 
decision-theoretic arguments. Our example shows how 
dangerous it can be to apply evidence theory without 
investigating the validity in an application of its crucial 
assumption of reliable private frames for all sources of 
evidence and precise multi-valued mappings from this 
frame to the frame of interest. The robust rule seems 
to give a reasonable fit to most fusion rules based on 
different statistical models, with the notable exception 
of Dempster’s rule. Thus, as long as the capacity inter- 
pretation is prevalent in evidence theory applications, 
there are good reasons to consider whether the application 
would benefit from using the MDS rule (complemented 
with priors if available) also for combining information 
in the style of likelihoods. In this case, however, the 
combination of the MDS rule with pignistic transfor- 
mation is interpretable as a precise Bayesian analysis. 
In most applications I expect that the precise Bayesian 
framework is adequate, and it is mainly in applications 
with the taste of risk analysis that the robust Bayesian 
framework will be appropriate. 
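
The relation between the MDS rule, the pignistic transformation and precise Bayesian analysis can be checked numerically on the operands of Table II. The following Python sketch again assumes the uniform-prior MDS weight |X∩Y|/(|X||Y|), and the function names are illustrative only. It compares the pignistic transformation of the MDS result with the normalized product of the pignistic transformations of the two operands.

```python
from itertools import product

# DS-structures from the columns marked m in Table II: focal set -> mass
op1 = {frozenset("A"): 0.2, frozenset("B"): 0.2,
       frozenset("C"): 0.3, frozenset("BC"): 0.3}
op2 = {frozenset("A"): 0.4, frozenset("B"): 0.1,
       frozenset("C"): 0.1, frozenset("BC"): 0.4}


def mds(m1, m2):
    """MDS combination, assuming a uniform prior: each product of masses
    is weighted by |X & Y| / (|X| |Y|) and the result is normalized."""
    out = {}
    for (x, a), (y, b) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            w = len(z) / (len(x) * len(y))
            out[z] = out.get(z, 0.0) + a * b * w
    total = sum(out.values())
    return {z: v / total for z, v in out.items()}


def pignistic(m):
    """Spread the mass of every focal set evenly over its elements."""
    p = {}
    for focal, mass in m.items():
        for e in focal:
            p[e] = p.get(e, 0.0) + mass / len(focal)
    return p


def bayes(p, q):
    """Normalized pointwise product of two precise pdfs."""
    prod = {e: p[e] * q[e] for e in p}
    total = sum(prod.values())
    return {e: v / total for e, v in prod.items()}


lhs = pignistic(mds(op1, op2))                # fuse first, then transform
rhs = bayes(pignistic(op1), pignistic(op2))   # transform first, then fuse

for e in "ABC":
    print(e, round(lhs[e], 3), round(rhs[e], 3))
```

For these operands both routes give the same precise pdf, approximately (0.250, 0.328, 0.422) for A, B and C.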
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